Fast Nearest Neighbor Search on Large Time-Evolving Graphs
Abstract
Finding the k nearest neighbors (k-nns) of a given vertex in a graph has many applications such as link prediction, keyword search, and image tagging. One robust measure of vertex-proximity in graphs is the Personalized PageRank (ppr) score based on random walk with restarts. Since ppr scores have long-range correlations, computing them accurately and efficiently is challenging when the graph is too large to fit in memory, especially when it also changes over time. In this work, we propose an efficient algorithm to answer ppr-based k-nn queries in large time-evolving graphs. Our key approach is to use a divide-and-conquer framework and efficiently compute answers in a distributed computing environment. We represent a given graph as a collection of dense vertex-clusters with their interconnections. Each vertex-cluster maintains certain information related to internal random walks and updates this information as the graph changes. At query time, we combine this information from a small set of relevant clusters and compute ppr scores efficiently. We validate the effectiveness of our method on large real-world graphs from diverse domains. To the best of our knowledge, this is one of the few works that simultaneously addresses answering k-nn queries in possibly disk-resident and time-evolving graphs.
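To make the notion of a ppr-based k-nn query concrete, the following is a minimal in-memory sketch that computes ppr scores by plain power iteration and returns the top-k vertices. It is not the clustered, distributed algorithm the abstract describes. The function names `personalized_pagerank` and `ppr_knn`, the restart probability of 0.15, and the adjacency-dictionary input are all assumptions made for illustration.

```python
def personalized_pagerank(adj, source, restart=0.15, iters=100):
    """Approximate ppr scores of every vertex with respect to `source`.

    adj: dict mapping each vertex to a list of its out-neighbors.
         Assumption: every neighbor also appears as a key of `adj`.
    restart: probability of jumping back to `source` at each step
             (0.15 is an illustrative choice, not taken from the paper).
    """
    scores = {v: 0.0 for v in adj}
    scores[source] = 1.0
    for _ in range(iters):
        nxt = {v: 0.0 for v in adj}
        for v, nbrs in adj.items():
            walk_mass = (1.0 - restart) * scores[v]
            if nbrs:
                # Continue the walk: split the mass evenly over out-neighbors.
                share = walk_mass / len(nbrs)
                for u in nbrs:
                    nxt[u] += share
            else:
                # Dangling vertex: send the walk back to the source.
                nxt[source] += walk_mass
        # Restart step: total mass is 1, so `restart` of it returns to source.
        nxt[source] += restart
        scores = nxt
    return scores


def ppr_knn(adj, source, k, restart=0.15, iters=100):
    """Return the k vertices (excluding `source`) with the highest ppr score."""
    scores = personalized_pagerank(adj, source, restart, iters)
    candidates = [v for v in scores if v != source]
    # Sort by descending score; break ties by vertex name for determinism.
    candidates.sort(key=lambda v: (-scores[v], str(v)))
    return candidates[:k]


if __name__ == "__main__":
    # Small undirected example graph: b - a - c - d
    graph = {"a": ["b", "c"], "b": ["a"], "c": ["a", "d"], "d": ["c"]}
    print(ppr_knn(graph, "a", 2))
```

Each iteration of this sketch touches every edge of the graph, which is the cost that becomes prohibitive for disk-resident or frequently changing graphs, the setting the abstract targets.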
Similar resources
Fast Large-Scale Approximate Graph Construction for NLP
Many natural language processing problems involve constructing large nearest-neighbor graphs. We propose a system called FLAG to construct such graphs approximately from large data sets. To handle the large amount of data, our algorithm maintains approximate counts based on sketching algorithms. To find the approximate nearest neighbors, our algorithm pairs a new distributed online-PMI algorith...
An Improved K-Nearest Neighbor with Crow Search Algorithm for Feature Selection in Text Documents Classification
The Internet provides easy access to a kind of library resources. However, classification of documents from a large amount of data is still an issue and demands time and energy to find certain documents. Classification of similar documents in specific classes of data can reduce the time for searching the required data, particularly text documents. This is further facilitated by using Artificial...
FLAG: Fast Large-Scale Graph Construction for NLP
Many natural language processing (NLP) problems involve constructing large nearest-neighbor graphs between word pairs by computing distributional similarity between word pairs from large corpora. In this paper, first we describe a system called FLAG to construct such graphs approximately from large data sets. To handle the large amount of data in memory and time efficient manner, FLAG maintains...
Nearest Neighbor Search in the Metric Space of a Complex Network for Community Detection
The objective of this article is to bridge the gap between two important research directions: (1) nearest neighbor search, which is a fundamental computational tool for large data analysis; and (2) complex network analysis, which deals with large real graphs but is generally studied via graph theoretic analysis or spectral analysis. In this article, we have studied the nearest neighbor search p...